Overview

Dataset statistics

Number of variables: 14
Number of observations: 3001
Missing cells: 2384
Missing cells (%): 5.7%
Duplicate rows: 5
Duplicate rows (%): 0.2%
Total size in memory: 328.4 KiB
Average record size in memory: 112.0 B
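
The two percentages above follow directly from the raw counts. A minimal sketch of the arithmetic, using only figures reported in this overview (the variable names are illustrative and not part of the report):

```python
# Figures copied from the dataset statistics; the names are illustrative only.
n_variables = 14
n_observations = 3001
missing_cells = 2384
duplicate_rows = 5

# Every variable contributes one cell per observation.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells
duplicate_pct = 100 * duplicate_rows / n_observations

print(f"Total cells: {total_cells}")
print(f"Missing cells: {missing_pct:.1f}%")
print(f"Duplicate rows: {duplicate_pct:.1f}%")
```

The 2384 missing cells are the sum of the two per-variable figures in the alerts (142 for shippedDate and 2242 for comments).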

Variable types

Numeric: 7
Categorical: 6
Text: 1

Alerts

Dataset has 5 (0.2%) duplicate rows | Duplicates
comments is highly overall correlated with customerNumber and 7 other fields | High correlation
customerNumber is highly overall correlated with comments | High correlation
employeeNumber is highly overall correlated with comments and 1 other field | High correlation
orderDate is highly overall correlated with comments and 2 other fields | High correlation
orderNumber is highly overall correlated with comments | High correlation
origin is highly overall correlated with comments and 1 other field | High correlation
priceEach is highly overall correlated with sales_amount | High correlation
quantityOrdered is highly overall correlated with sales_amount | High correlation
requiredDate is highly overall correlated with comments and 2 other fields | High correlation
sales_amount is highly overall correlated with priceEach and 1 other field | High correlation
shippedDate is highly overall correlated with comments and 2 other fields | High correlation
status is highly overall correlated with comments | High correlation
orderDate is highly imbalanced (98.9%) | Imbalance
shippedDate is highly imbalanced (95.9%) | Imbalance
requiredDate is highly imbalanced (96.1%) | Imbalance
status is highly imbalanced (78.8%) | Imbalance
origin is highly imbalanced (73.2%) | Imbalance
shippedDate has 142 (4.7%) missing values | Missing
comments has 2242 (74.7%) missing values | Missing
employeeNumber has 137 (4.6%) zeros | Zeros
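
The Imbalance alerts give a score between 0% (classes evenly spread) and 100% (a single class dominates completely). The report does not state the formula. One common definition is one minus the normalized Shannon entropy of the class frequencies, and the sketch below assumes that definition. With the value counts listed in the orderDate and origin sections, it gives figures that agree with the alerts above.

```python
import math

def imbalance_score(counts):
    """Return 1 - H / log2(k) for class counts (assumed definition).

    H is the Shannon entropy of the class frequencies and k is the
    number of classes; at least two classes are required.
    """
    k = len(counts)
    if k < 2:
        raise ValueError("need at least two classes")
    total = sum(counts)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(k)

order_date = imbalance_score([2998, 3])    # counts from the orderDate section
origin = imbalance_score([2864, 137])      # counts from the origin section
print(f"orderDate: {order_date:.1%}, origin: {origin:.1%}")
```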

Reproduction

Analysis started: 2025-02-06 15:21:06.895701
Analysis finished: 2025-02-06 15:21:12.353255
Duration: 5.46 seconds
Software version: ydata-profiling v4.12.2
Download configuration: config.json

Variables

orderNumber
Real number (ℝ)

High correlation 

Distinct: 326
Distinct (%): 10.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 10260.509
Minimum: 10100
Maximum: 10425
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 23.6 KiB

Quantile statistics

Minimum: 10100
5-th percentile: 10117
Q1: 10181
Median: 10263
Q3: 10339
95-th percentile: 10407
Maximum: 10425
Range: 325
Interquartile range (IQR): 158

Descriptive statistics

Standard deviation: 92.61975
Coefficient of variation (CV): 0.0090268181
Kurtosis: -1.1828136
Mean: 10260.509
Median Absolute Deviation (MAD): 80
Skewness: 0.013978109
Sum: 30791788
Variance: 8578.418
Monotonicity: Increasing
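
Several of these statistics are derived from the others. As a small worked check, the sketch below recomputes the coefficient of variation, range, interquartile range, and variance from the orderNumber figures reported here (inputs are the rounded values from the report, so results agree only up to rounding):

```python
# Values copied from the orderNumber statistics in this report.
mean = 10260.509
std = 92.61975
q1, q3 = 10181, 10339
minimum, maximum = 10100, 10425

cv = std / mean                    # coefficient of variation
iqr = q3 - q1                      # interquartile range
value_range = maximum - minimum    # range
variance = std ** 2                # variance is the squared standard deviation

print(f"CV={cv:.7f}, IQR={iqr}, range={value_range}, variance={variance:.1f}")
```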
Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
10360 | 18 | 0.6%
10168 | 18 | 0.6%
10386 | 18 | 0.6%
10222 | 18 | 0.6%
10275 | 18 | 0.6%
10106 | 18 | 0.6%
10165 | 18 | 0.6%
10398 | 18 | 0.6%
10332 | 18 | 0.6%
10159 | 18 | 0.6%
Other values (316) | 2821 | 94.0%

Minimum 10 values

Value | Count | Frequency (%)
10100 | 4 | 0.1%
10101 | 4 | 0.1%
10102 | 2 | 0.1%
10103 | 16 | 0.5%
10104 | 14 | 0.5%
10105 | 15 | 0.5%
10106 | 18 | 0.6%
10107 | 8 | 0.3%
10108 | 16 | 0.5%
10109 | 6 | 0.2%

Maximum 10 values

Value | Count | Frequency (%)
10425 | 14 | 0.5%
10424 | 6 | 0.2%
10423 | 5 | 0.2%
10422 | 2 | 0.1%
10421 | 2 | 0.1%
10420 | 13 | 0.4%
10419 | 15 | 0.5%
10418 | 9 | 0.3%
10417 | 6 | 0.2%
10416 | 14 | 0.5%

orderLineNumber
Real number (ℝ)

Distinct: 18
Distinct (%): 0.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.4245252
Minimum: 1
Maximum: 18
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 23.6 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 3
Median: 6
Q3: 9
95-th percentile: 14
Maximum: 18
Range: 17
Interquartile range (IQR): 6

Descriptive statistics

Standard deviation: 4.1968701
Coefficient of variation (CV): 0.65325763
Kurtosis: -0.5203634
Mean: 6.4245252
Median Absolute Deviation (MAD): 3
Skewness: 0.6036522
Sum: 19280
Variance: 17.613718
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=18)

Common values

Value | Count | Frequency (%)
1 | 327 | 10.9%
2 | 311 | 10.4%
3 | 288 | 9.6%
4 | 273 | 9.1%
5 | 256 | 8.5%
6 | 239 | 8.0%
7 | 212 | 7.1%
8 | 201 | 6.7%
9 | 177 | 5.9%
10 | 149 | 5.0%
Other values (8) | 568 | 18.9%

Minimum 10 values

Value | Count | Frequency (%)
1 | 327 | 10.9%
2 | 311 | 10.4%
3 | 288 | 9.6%
4 | 273 | 9.1%
5 | 256 | 8.5%
6 | 239 | 8.0%
7 | 212 | 7.1%
8 | 201 | 6.7%
9 | 177 | 5.9%
10 | 149 | 5.0%

Maximum 10 values

Value | Count | Frequency (%)
18 | 11 | 0.4%
17 | 26 | 0.9%
16 | 43 | 1.4%
15 | 57 | 1.9%
14 | 82 | 2.7%
13 | 101 | 3.4%
12 | 114 | 3.8%
11 | 134 | 4.5%
10 | 149 | 5.0%
9 | 177 | 5.9%
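
Because orderLineNumber has only 18 distinct values, and the frequency tables for it list all of them, its summary statistics can be rebuilt from the counts alone. The sketch below does this with Python's standard statistics module. It takes MAD to be the median of absolute deviations from the median, and the standard deviation to be the sample (n - 1) version. Both are assumptions that happen to reproduce the reported figures.

```python
import statistics

# Value -> count for orderLineNumber, copied from the frequency tables.
counts = {1: 327, 2: 311, 3: 288, 4: 273, 5: 256, 6: 239, 7: 212, 8: 201,
          9: 177, 10: 149, 11: 134, 12: 114, 13: 101, 14: 82, 15: 57,
          16: 43, 17: 26, 18: 11}

# Expand the table back into one value per observation.
values = [v for v, c in counts.items() for _ in range(c)]

median = statistics.median(values)
mad = statistics.median(abs(v - median) for v in values)
mean = statistics.fmean(values)
stdev = statistics.stdev(values)   # sample standard deviation (n - 1)

print(len(values), sum(values), median, mad, round(mean, 4), round(stdev, 4))
```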

orderDate
Categorical

High correlation  Imbalance 

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 23.6 KiB

Value | Count
0000-00-00 | 2998
2038-09-00 | 3

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Characters and Unicode

Total characters: 30010
Distinct characters: 6
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
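
As an illustration of the character properties mentioned here, Python's standard unicodedata module exposes the general category and name of each code point. The sketch below inspects one of the orderDate values. It only illustrates the idea and is not necessarily how the report computes its Unicode statistics.

```python
import unicodedata

value = "2038-09-00"   # one of the orderDate values in this report

for ch in sorted(set(value)):
    # category() returns the two-letter general category of the code point
    print(repr(ch), unicodedata.category(ch), unicodedata.name(ch))

# Set of general categories that occur in the value.
categories = {unicodedata.category(ch) for ch in value}
print(categories)
```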

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0000-00-00
2nd row: 0000-00-00
3rd row: 0000-00-00
4th row: 0000-00-00
5th row: 0000-00-00

Common Values

Value | Count | Frequency (%)
0000-00-00 | 2998 | 99.9%
2038-09-00 | 3 | 0.1%
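
Both orderDate values have the YYYY-MM-DD shape but contain a zero month or day, so neither is a valid calendar date. A minimal way to flag such values before treating the column as dates, assuming the intended format is %Y-%m-%d (the helper name is illustrative, and the third test value is taken from the shippedDate section for comparison):

```python
from datetime import datetime

def is_valid_date(text, fmt="%Y-%m-%d"):
    """Return True if text parses as a real calendar date in the given format."""
    try:
        datetime.strptime(text, fmt)
        return True
    except ValueError:
        return False

for value in ["0000-00-00", "2038-09-00", "2038-09-07"]:
    print(value, is_valid_date(value))
```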

Length

Histogram of lengths of the category

Common Values (Plot)

Words

Value | Count | Frequency (%)
0000-00-00 | 2998 | 99.9%
2038-09-00 | 3 | 0.1%

Most occurring characters

Value | Count | Frequency (%)
0 | 23996 | 80.0%
- | 6002 | 20.0%
2 | 3 | < 0.1%
3 | 3 | < 0.1%
8 | 3 | < 0.1%
9 | 3 | < 0.1%

Most occurring categories, scripts, and blocks

Each of the three groupings (category, script, block) contains a single (unknown) entry covering all 30010 characters (100.0%), so the most frequent characters per category, per script, and per block are the same as in the table above.

shippedDate
Categorical

High correlation  Imbalance  Missing 

Distinct: 3
Distinct (%): 0.1%
Missing: 142
Missing (%): 4.7%
Memory size: 23.6 KiB

Value | Count
0000-00-00 | 2839
2038-00-06 | 17
2038-09-07 | 3

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Characters and Unicode

Total characters: 28590
Distinct characters: 8
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0000-00-00
2nd row: 0000-00-00
3rd row: 0000-00-00
4th row: 0000-00-00
5th row: 0000-00-00

Common Values

Value | Count | Frequency (%)
0000-00-00 | 2839 | 94.6%
2038-00-06 | 17 | 0.6%
2038-09-07 | 3 | 0.1%
(Missing) | 142 | 4.7%

Length

Histogram of lengths of the category

Common Values (Plot)

Words

Value | Count | Frequency (%)
0000-00-00 | 2839 | 99.3%
2038-00-06 | 17 | 0.6%
2038-09-07 | 3 | 0.1%

Most occurring characters

Value | Count | Frequency (%)
0 | 22789 | 79.7%
- | 5718 | 20.0%
2 | 20 | 0.1%
3 | 20 | 0.1%
8 | 20 | 0.1%
6 | 17 | 0.1%
9 | 3 | < 0.1%
7 | 3 | < 0.1%

Most occurring categories, scripts, and blocks

Each of the three groupings (category, script, block) contains a single (unknown) entry covering all 28590 characters (100.0%), so the most frequent characters per category, per script, and per block are the same as in the table above.

requiredDate
Categorical

High correlation  Imbalance 

Distinct: 3
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 23.6 KiB

Value | Count
0000-00-00 | 2981
2038-00-08 | 17
2038-09-07 | 3

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Characters and Unicode

Total characters: 30010
Distinct characters: 7
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0000-00-00
2nd row: 0000-00-00
3rd row: 0000-00-00
4th row: 0000-00-00
5th row: 0000-00-00

Common Values

Value | Count | Frequency (%)
0000-00-00 | 2981 | 99.3%
2038-00-08 | 17 | 0.6%
2038-09-07 | 3 | 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Words

Value | Count | Frequency (%)
0000-00-00 | 2981 | 99.3%
2038-00-08 | 17 | 0.6%
2038-09-07 | 3 | 0.1%

Most occurring characters

Value | Count | Frequency (%)
0 | 23925 | 79.7%
- | 6002 | 20.0%
8 | 37 | 0.1%
2 | 20 | 0.1%
3 | 20 | 0.1%
9 | 3 | < 0.1%
7 | 3 | < 0.1%

Most occurring categories, scripts, and blocks

Each of the three groupings (category, script, block) contains a single (unknown) entry covering all 30010 characters (100.0%), so the most frequent characters per category, per script, and per block are the same as in the table above.

customerNumber
Real number (ℝ)

High correlation 

Distinct: 98
Distinct (%): 3.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 259.63912
Minimum: 103
Maximum: 496
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 23.6 KiB

Quantile statistics

Minimum: 103
5-th percentile: 121
Q1: 145
Median: 240
Q3: 353
95-th percentile: 473
Maximum: 496
Range: 393
Interquartile range (IQR): 208

Descriptive statistics

Standard deviation: 118.40343
Coefficient of variation (CV): 0.4560308
Kurtosis: -1.0888518
Mean: 259.63912
Median Absolute Deviation (MAD): 99
Skewness: 0.45561081
Sum: 779177
Variance: 14019.373
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
141 | 260 | 8.7%
124 | 180 | 6.0%
114 | 55 | 1.8%
119 | 54 | 1.8%
187 | 51 | 1.7%
131 | 49 | 1.6%
496 | 48 | 1.6%
278 | 48 | 1.6%
151 | 48 | 1.6%
282 | 46 | 1.5%
Other values (88) | 2162 | 72.0%

Minimum 10 values

Value | Count | Frequency (%)
103 | 7 | 0.2%
112 | 29 | 1.0%
114 | 55 | 1.8%
119 | 54 | 1.8%
121 | 32 | 1.1%
124 | 180 | 6.0%
128 | 22 | 0.7%
129 | 21 | 0.7%
131 | 49 | 1.6%
141 | 260 | 8.7%

Maximum 10 values

Value | Count | Frequency (%)
496 | 48 | 1.6%
495 | 18 | 0.6%
489 | 12 | 0.4%
487 | 15 | 0.5%
486 | 23 | 0.8%
484 | 15 | 0.5%
475 | 13 | 0.4%
473 | 8 | 0.3%
471 | 23 | 0.8%
462 | 26 | 0.9%

employeeNumber
Real number (ℝ)

High correlation  Zeros 

Distinct: 15
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1317.9487
Minimum: 0
Maximum: 1702
Zeros: 137
Zeros (%): 4.6%
Negative: 0
Negative (%): 0.0%
Memory size: 23.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1165
Q1: 1216
Median: 1370
Q3: 1501
95-th percentile: 1612
Maximum: 1702
Range: 1702
Interquartile range (IQR): 285

Descriptive statistics

Standard deviation: 326.34357
Coefficient of variation (CV): 0.24761478
Kurtosis: 9.347836
Mean: 1317.9487
Median Absolute Deviation (MAD): 134
Skewness: -2.8477689
Sum: 3955164
Variance: 106500.13
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=15)

Common values

Value | Count | Frequency (%)
1370 | 398 | 13.3%
1165 | 331 | 11.0%
1401 | 273 | 9.1%
1501 | 236 | 7.9%
1504 | 220 | 7.3%
1323 | 212 | 7.1%
1612 | 186 | 6.2%
1611 | 185 | 6.2%
1337 | 177 | 5.9%
1216 | 152 | 5.1%
Other values (5) | 631 | 21.0%

Minimum 10 values

Value | Count | Frequency (%)
0 | 137 | 4.6%
1165 | 331 | 11.0%
1166 | 114 | 3.8%
1188 | 124 | 4.1%
1216 | 152 | 5.1%
1286 | 142 | 4.7%
1323 | 212 | 7.1%
1337 | 177 | 5.9%
1370 | 398 | 13.3%
1401 | 273 | 9.1%

Maximum 10 values

Value | Count | Frequency (%)
1702 | 114 | 3.8%
1612 | 186 | 6.2%
1611 | 185 | 6.2%
1504 | 220 | 7.3%
1501 | 236 | 7.9%
1401 | 273 | 9.1%
1370 | 398 | 13.3%
1337 | 177 | 5.9%
1323 | 212 | 7.1%
1286 | 142 | 4.7%

productCode
Text

Distinct: 109
Distinct (%): 3.6%
Missing: 0
Missing (%): 0.0%
Memory size: 23.6 KiB

Length

Max length: 9
Median length: 8
Mean length: 8.1102966
Min length: 8

Characters and Unicode

Total characters: 24339
Distinct characters: 12
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: S24_3969
2nd row: S18_2248
3rd row: S18_1749
4th row: S18_4409
5th row: S18_2795

Words

Value | Count | Frequency (%)
s18_3232 | 53 | 1.8%
s18_3136 | 29 | 1.0%
s18_2238 | 29 | 1.0%
s24_2300 | 28 | 0.9%
s18_2319 | 28 | 0.9%
s50_1392 | 28 | 0.9%
s32_1268 | 28 | 0.9%
s18_1097 | 28 | 0.9%
s18_4668 | 28 | 0.9%
s12_1666 | 28 | 0.9%
Other values (99) | 2694 | 89.8%

Most occurring characters

Value | Count | Frequency (%)
2 | 3414 | 14.0%
1 | 3030 | 12.4%
S | 3001 | 12.3%
_ | 3001 | 12.3%
4 | 2146 | 8.8%
8 | 2143 | 8.8%
0 | 1899 | 7.8%
3 | 1827 | 7.5%
7 | 1149 | 4.7%
9 | 1003 | 4.1%
Other values (2) | 1726 | 7.1%

Most occurring categories, scripts, and blocks

Each of the three groupings (category, script, block) contains a single (unknown) entry covering all 24339 characters (100.0%), so the most frequent characters per category, per script, and per block are the same as in the table above.

status
Categorical

High correlation  Imbalance 

Distinct: 6
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 23.6 KiB

Value | Count
Shipped | 2775
Cancelled | 79
Resolved | 47
On Hold | 44
In Process | 42

Length

Max length: 10
Median length: 7
Mean length: 7.1149617
Min length: 7

Characters and Unicode

Total characters: 21352
Distinct characters: 24
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Shipped
2nd row: Shipped
3rd row: Shipped
4th row: Shipped
5th row: Shipped

Common Values

Value | Count | Frequency (%)
Shipped | 2775 | 92.5%
Cancelled | 79 | 2.6%
Resolved | 47 | 1.6%
On Hold | 44 | 1.5%
In Process | 42 | 1.4%
Disputed | 14 | 0.5%

Length

Histogram of lengths of the category

Common Values (Plot)

Words

Value | Count | Frequency (%)
shipped | 2775 | 89.9%
cancelled | 79 | 2.6%
resolved | 47 | 1.5%
on | 44 | 1.4%
hold | 44 | 1.4%
in | 42 | 1.4%
process | 42 | 1.4%
disputed | 14 | 0.5%

Most occurring characters

Value | Count | Frequency (%)
p | 5564 | 26.1%
e | 3083 | 14.4%
d | 2959 | 13.9%
i | 2789 | 13.1%
S | 2775 | 13.0%
h | 2775 | 13.0%
l | 249 | 1.2%
n | 165 | 0.8%
s | 145 | 0.7%
o | 133 | 0.6%
Other values (14) | 715 | 3.3%

Most occurring categories, scripts, and blocks

Each of the three groupings (category, script, block) contains a single (unknown) entry covering all 21352 characters (100.0%), so the most frequent characters per category, per script, and per block are the same as in the table above.

comments
Categorical

High correlation  Missing 

Distinct: 37
Distinct (%): 4.9%
Missing: 2242
Missing (%): 74.7%
Memory size: 23.6 KiB

Value | Count
They want to reevaluate their terms agreement with Finance. | 70
Customer requested that DHL is used for this shipping | 64
Customer requested that FedEx Ground is used for this shipping | 43
Custom shipping instructions sent to warehouse | 38
Customer credit limit exceeded. Will ship when a payment is received. | 38
Other values (32) | 506

Length

Max length: 189
Median length: 123
Mean length: 83.646904
Min length: 22

Characters and Unicode

Total characters: 63488
Distinct characters: 57
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Check on availability.
2nd row: Check on availability.
3rd row: Check on availability.
4th row: Check on availability.
5th row: Difficult to negotiate with customer. We need more marketing materials

Common Values

Value | Count | Frequency (%)
They want to reevaluate their terms agreement with Finance. | 70 | 2.3%
Customer requested that DHL is used for this shipping | 64 | 2.1%
Customer requested that FedEx Ground is used for this shipping | 43 | 1.4%
Custom shipping instructions sent to warehouse | 38 | 1.3%
Customer credit limit exceeded. Will ship when a payment is received. | 38 | 1.3%
Customer very concerned about the exact color of the models. There is high risk that he may dispute the order because there is a slight color mismatch | 35 | 1.2%
Check on availability. | 34 | 1.1%
Can we deliver the new Ford Mustang models by end-of-quarter? | 33 | 1.1%
Customer requested special shippment. The instructions were passed along to the warehouse | 32 | 1.1%
Cautious optimism. We have happy customers here, if we can keep them well stocked. I need all the information I can get on the planned shippments of Porches | 28 | 0.9%
Other values (27) | 344 | 11.5%
(Missing) | 2242 | 74.7%

Length

Histogram of lengths of the category

Words

Value | Count | Frequency (%)
the | 557 | 5.3%
customer | 478 | 4.6%
is | 289 | 2.8%
to | 284 | 2.7%
this | 255 | 2.4%
of | 241 | 2.3%
we | 226 | 2.2%
with | 198 | 1.9%
their | 186 | 1.8%
that | 177 | 1.7%
Other values (216) | 7529 | 72.3%

Most occurring characters

Value | Count | Frequency (%)
(space) | 9694 | 15.3%
e | 7457 | 11.7%
t | 4787 | 7.5%
s | 4010 | 6.3%
i | 3574 | 5.6%
o | 3411 | 5.4%
r | 3324 | 5.2%
a | 3089 | 4.9%
h | 2769 | 4.4%
n | 2717 | 4.3%
Other values (47) | 18656 | 29.4%

Most occurring categories, scripts, and blocks

Each of the three groupings (category, script, block) contains a single (unknown) entry covering all 63488 characters (100.0%), so the most frequent characters per category, per script, and per block are the same as in the table above.

quantityOrdered
Real number (ℝ)

High correlation 

Distinct: 61
Distinct (%): 2.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 35.211929
Minimum: 6
Maximum: 97
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 23.6 KiB

Quantile statistics

Minimum: 6
5-th percentile: 21
Q1: 27
Median: 35
Q3: 43
95-th percentile: 49
Maximum: 97
Range: 91
Interquartile range (IQR): 16

Descriptive statistics

Standard deviation: 9.8289574
Coefficient of variation (CV): 0.27913715
Kurtosis: 0.69438677
Mean: 35.211929
Median Absolute Deviation (MAD): 8
Skewness: 0.42453422
Sum: 105671
Variance: 96.608404
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
34 | 122 | 4.1%
31 | 108 | 3.6%
21 | 108 | 3.6%
46 | 104 | 3.5%
33 | 104 | 3.5%
27 | 104 | 3.5%
45 | 100 | 3.3%
29 | 100 | 3.3%
41 | 100 | 3.3%
48 | 99 | 3.3%
Other values (51) | 1952 | 65.0%

Minimum 10 values

Value | Count | Frequency (%)
6 | 2 | 0.1%
10 | 3 | 0.1%
11 | 2 | 0.1%
12 | 1 | < 0.1%
13 | 1 | < 0.1%
15 | 4 | 0.1%
16 | 1 | < 0.1%
18 | 3 | 0.1%
19 | 2 | 0.1%
20 | 96 | 3.2%

Maximum 10 values

Value | Count | Frequency (%)
97 | 1 | < 0.1%
90 | 1 | < 0.1%
85 | 1 | < 0.1%
77 | 2 | 0.1%
76 | 3 | 0.1%
70 | 2 | 0.1%
66 | 5 | 0.2%
65 | 2 | 0.1%
64 | 4 | 0.1%
62 | 1 | < 0.1%

priceEach
Real number (ℝ)

High correlation 

Distinct: 1573
Distinct (%): 52.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 90.765831
Minimum: 26.55
Maximum: 214.3
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 23.6 KiB

Quantile statistics

Minimum: 26.55
5-th percentile: 38.98
Q1: 62
Median: 85.76
Q3: 114.65
95-th percentile: 158.63
Maximum: 214.3
Range: 187.75
Interquartile range (IQR): 52.65

Descriptive statistics

Standard deviation: 36.579368
Coefficient of variation (CV): 0.40300813
Kurtosis: 0.081135768
Mean: 90.765831
Median Absolute Deviation (MAD): 25.76
Skewness: 0.64302467
Sum: 272388.26
Variance: 1338.0502
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
98.48 | 8 | 0.3%
117.48 | 8 | 0.3%
56.55 | 7 | 0.2%
89.38 | 6 | 0.2%
62 | 6 | 0.2%
153 | 6 | 0.2%
43.27 | 6 | 0.2%
67.03 | 6 | 0.2%
120.53 | 6 | 0.2%
137.17 | 6 | 0.2%
Other values (1563) | 2936 | 97.8%

Minimum 10 values

Value | Count | Frequency (%)
26.55 | 2 | 0.1%
27.22 | 1 | < 0.1%
27.55 | 1 | < 0.1%
27.88 | 4 | 0.1%
28.64 | 2 | 0.1%
28.88 | 2 | 0.1%
29.21 | 1 | < 0.1%
29.35 | 2 | 0.1%
29.54 | 1 | < 0.1%
29.87 | 4 | 0.1%

Maximum 10 values

Value | Count | Frequency (%)
214.3 | 3 | 0.1%
212.16 | 1 | < 0.1%
210.01 | 1 | < 0.1%
207.87 | 1 | < 0.1%
207.8 | 1 | < 0.1%
205.73 | 4 | 0.1%
205.72 | 2 | 0.1%
203.64 | 1 | < 0.1%
203.59 | 2 | 0.1%
201.57 | 2 | 0.1%

sales_amount
Real number (ℝ)

High correlation 

Distinct: 2885
Distinct (%): 96.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3204.9084
Minimum: 481.5
Maximum: 11503.14
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 23.6 KiB

Quantile statistics

Minimum: 481.5
5-th percentile: 1166.16
Q1: 1988.7
Median: 2880.48
Q3: 4093.6
95-th percentile: 6366
Maximum: 11503.14
Range: 11021.64
Interquartile range (IQR): 2104.9

Descriptive statistics

Standard deviation: 1631.357
Coefficient of variation (CV): 0.50901828
Kurtosis: 1.5215921
Mean: 3204.9084
Median Absolute Deviation (MAD): 1012.5
Skewness: 1.1049456
Sum: 9617930.2
Variance: 2661325.6
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
2462 | 3 | 0.1%
4212 | 3 | 0.1%
3357.75 | 2 | 0.1%
3213 | 2 | 0.1%
1679.92 | 2 | 0.1%
1584.44 | 2 | 0.1%
2808.96 | 2 | 0.1%
2434.25 | 2 | 0.1%
1928.75 | 2 | 0.1%
3264 | 2 | 0.1%
Other values (2875) | 2979 | 99.3%

Minimum 10 values

Value | Count | Frequency (%)
481.5 | 1 | < 0.1%
529.35 | 1 | < 0.1%
531 | 1 | < 0.1%
546.66 | 1 | < 0.1%
553.52 | 1 | < 0.1%
557.6 | 1 | < 0.1%
577.6 | 1 | < 0.1%
597.4 | 1 | < 0.1%
615 | 1 | < 0.1%
625.5 | 1 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
11503.14 | 1 | < 0.1%
11170.52 | 1 | < 0.1%
10723.6 | 1 | < 0.1%
10460.16 | 1 | < 0.1%
10286.4 | 1 | < 0.1%
10072 | 1 | < 0.1%
9974.4 | 1 | < 0.1%
9712.04 | 1 | < 0.1%
9571.08 | 1 | < 0.1%
9568.73 | 1 | < 0.1%

origin
Categorical

High correlation  Imbalance 

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 23.6 KiB

Value | Count
spain | 2864
japan | 137

Length

Max length: 5
Median length: 5
Mean length: 5
Min length: 5

Characters and Unicode

Total characters: 15005
Distinct characters: 6
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: spain
2nd row: spain
3rd row: spain
4th row: spain
5th row: spain

Common Values

Value | Count | Frequency (%)
spain | 2864 | 95.4%
japan | 137 | 4.6%

Length

Histogram of lengths of the category

Common Values (Plot)

Words

Value | Count | Frequency (%)
spain | 2864 | 95.4%
japan | 137 | 4.6%

Most occurring characters

Value | Count | Frequency (%)
a | 3138 | 20.9%
p | 3001 | 20.0%
n | 3001 | 20.0%
s | 2864 | 19.1%
i | 2864 | 19.1%
j | 137 | 0.9%

Most occurring categories, scripts, and blocks

Each of the three groupings (category, script, block) contains a single (unknown) entry covering all 15005 characters (100.0%), so the most frequent characters per category, per script, and per block are the same as in the table above.

Interactions

Pairwise interaction plots of the numeric variables (49 panels).

Correlations

(variable) | comments | customerNumber | employeeNumber | orderDate | orderLineNumber | orderNumber | origin | priceEach | quantityOrdered | requiredDate | sales_amount | shippedDate | status
comments | 1.000 | 0.672 | 0.708 | 1.000 | 0.000 | 0.682 | 0.774 | 0.033 | 0.205 | 1.000 | 0.091 | 1.000 | 0.979
customerNumber | 0.672 | 1.000 | 0.151 | 0.053 | -0.040 | -0.010 | 0.364 | -0.024 | 0.013 | 0.128 | -0.005 | 0.129 | 0.148
employeeNumber | 0.708 | 0.151 | 1.000 | 0.140 | 0.003 | 0.066 | 0.999 | -0.004 | -0.034 | 0.262 | -0.022 | 0.262 | 0.104
orderDate | 1.000 | 0.053 | 0.140 | 1.000 | 0.000 | 0.090 | 0.118 | 0.078 | 0.000 | 1.000 | 0.090 | 1.000 | 0.000
orderLineNumber | 0.000 | -0.040 | 0.003 | 0.000 | 1.000 | -0.043 | 0.000 | -0.008 | -0.023 | 0.000 | -0.018 | 0.000 | 0.000
orderNumber | 0.682 | -0.010 | 0.066 | 0.090 | -0.043 | 1.000 | 0.170 | -0.004 | 0.053 | 0.165 | 0.030 | 0.167 | 0.320
origin | 0.774 | 0.364 | 0.999 | 0.118 | 0.000 | 0.170 | 1.000 | 0.038 | 0.000 | 0.374 | 0.000 | 0.373 | 0.047
priceEach | 0.033 | -0.024 | -0.004 | 0.078 | -0.008 | -0.004 | 0.038 | 1.000 | 0.022 | 0.055 | 0.819 | 0.056 | 0.000
quantityOrdered | 0.205 | 0.013 | -0.034 | 0.000 | -0.023 | 0.053 | 0.000 | 0.022 | 1.000 | 0.000 | 0.563 | 0.000 | 0.182
requiredDate | 1.000 | 0.128 | 0.262 | 1.000 | 0.000 | 0.165 | 0.374 | 0.055 | 0.000 | 1.000 | 0.066 | 1.000 | 0.000
sales_amount | 0.091 | -0.005 | -0.022 | 0.090 | -0.018 | 0.030 | 0.000 | 0.819 | 0.563 | 0.066 | 1.000 | 0.068 | 0.047
shippedDate | 1.000 | 0.129 | 0.262 | 1.000 | 0.000 | 0.167 | 0.373 | 0.056 | 0.000 | 1.000 | 0.068 | 1.000 | 0.000
status | 0.979 | 0.148 | 0.104 | 0.000 | 0.000 | 0.320 | 0.047 | 0.000 | 0.182 | 0.000 | 0.047 | 0.000 | 1.000
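
The High correlation alerts appear to flag pairs whose coefficient in this matrix reaches some cut-off. The report does not state the cut-off. The sketch below assumes 0.5, which is consistent with the alert that comments is correlated with "customerNumber and 7 other fields".

```python
# Correlations of "comments" with the other variables, copied from the matrix.
comments_row = {
    "customerNumber": 0.672, "employeeNumber": 0.708, "orderDate": 1.000,
    "orderLineNumber": 0.000, "orderNumber": 0.682, "origin": 0.774,
    "priceEach": 0.033, "quantityOrdered": 0.205, "requiredDate": 1.000,
    "sales_amount": 0.091, "shippedDate": 1.000, "status": 0.979,
}

THRESHOLD = 0.5  # assumed cut-off; not stated in this report

# Keep the variables whose absolute correlation reaches the threshold.
flagged = sorted(name for name, r in comments_row.items() if abs(r) >= THRESHOLD)
print(len(flagged), flagged)
```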

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
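
A common way to compute nullity correlation is the Pearson correlation between the missing/present indicators of two columns. The sketch below assumes that definition and uses made-up toy columns, not data from this report. The function name is illustrative.

```python
import math

def nullity_correlation(col_a, col_b):
    """Pearson correlation between the missingness indicators of two columns."""
    a = [1.0 if v is None else 0.0 for v in col_a]
    b = [1.0 if v is None else 0.0 for v in col_b]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return float("nan")   # undefined if a column is never or always missing
    return cov / math.sqrt(var_a * var_b)

# Toy columns; None marks a missing cell.
col_x = [1, None, 3, None, 5, 6]
col_y = [None, None, "c", None, None, "f"]

same = nullity_correlation(col_x, col_x)
mixed = nullity_correlation(col_x, col_y)
print(same, mixed)
```

A value near 1 means the two columns tend to be missing in the same rows, and a value near -1 means one tends to be present when the other is missing.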

Sample

(index) | orderNumber | orderLineNumber | orderDate | shippedDate | requiredDate | customerNumber | employeeNumber | productCode | status | comments | quantityOrdered | priceEach | sales_amount | origin
0 | 10100 | 1 | 0000-00-00 | 0000-00-00 | 0000-00-00 | 363 | 1216 | S24_3969 | Shipped | NaN | 49 | 35.29 | 1729.21 | spain
1 | 10100 | 2 | 0000-00-00 | 0000-00-00 | 0000-00-00 | 363 | 1216 | S18_2248 | Shipped | NaN | 50 | 55.09 | 2754.50 | spain
2 | 10100 | 3 | 0000-00-00 | 0000-00-00 | 0000-00-00 | 363 | 1216 | S18_1749 | Shipped | NaN | 30 | 136.00 | 4080.00 | spain
3 | 10100 | 4 | 0000-00-00 | 0000-00-00 | 0000-00-00 | 363 | 1216 | S18_4409 | Shipped | NaN | 22 | 75.46 | 1660.12 | spain
4 | 10101 | 1 | 0000-00-00 | 0000-00-00 | 0000-00-00 | 128 | 1504 | S18_2795 | Shipped | Check on availability. | 26 | 167.06 | 4343.56 | spain
5 | 10101 | 2 | 0000-00-00 | 0000-00-00 | 0000-00-00 | 128 | 1504 | S24_2022 | Shipped | Check on availability. | 46 | 44.35 | 2040.10 | spain
6 | 10101 | 3 | 0000-00-00 | 0000-00-00 | 0000-00-00 | 128 | 1504 | S24_1937 | Shipped | Check on availability. | 45 | 32.53 | 1463.85 | spain
7 | 10101 | 4 | 0000-00-00 | 0000-00-00 | 0000-00-00 | 128 | 1504 | S18_2325 | Shipped | Check on availability. | 25 | 108.06 | 2701.50 | spain
8 | 10102 | 1 | 0000-00-00 | 0000-00-00 | 0000-00-00 | 181 | 1286 | S18_1367 | Shipped | NaN | 41 | 43.13 | 1768.33 | spain
9 | 10102 | 2 | 0000-00-00 | 0000-00-00 | 0000-00-00 | 181 | 1286 | S18_1342 | Shipped | NaN | 39 | 95.55 | 3726.45 | spain

(index) | orderNumber | orderLineNumber | orderDate | shippedDate | requiredDate | customerNumber | employeeNumber | productCode | status | comments | quantityOrdered | priceEach | sales_amount | origin
2991 | 10425 | 4 | 0000-00-00 | NaN | 0000-00-00 | 119 | 1370 | S12_4473 | In Process | NaN | 33 | 95.99 | 3167.67 | spain
2992 | 10425 | 5 | 0000-00-00 | NaN | 0000-00-00 | 119 | 1370 | S24_2840 | In Process | NaN | 31 | 31.82 | 986.42 | spain
2993 | 10425 | 6 | 0000-00-00 | NaN | 0000-00-00 | 119 | 1370 | S32_2509 | In Process | NaN | 11 | 50.32 | 553.52 | spain
2994 | 10425 | 7 | 0000-00-00 | NaN | 0000-00-00 | 119 | 1370 | S18_2319 | In Process | NaN | 38 | 117.82 | 4477.16 | spain
2995 | 10425 | 8 | 0000-00-00 | NaN | 0000-00-00 | 119 | 1370 | S18_3232 | In Process | NaN | 28 | 140.55 | 3935.40 | spain
2996 | 10425 | 9 | 0000-00-00 | NaN | 0000-00-00 | 119 | 1370 | S24_2300 | In Process | NaN | 49 | 127.79 | 6261.71 | spain
2997 | 10425 | 10 | 0000-00-00 | NaN | 0000-00-00 | 119 | 1370 | S18_2432 | In Process | NaN | 19 | 48.62 | 923.78 | spain
2998 | 10425 | 11 | 0000-00-00 | NaN | 0000-00-00 | 119 | 1370 | S32_1268 | In Process | NaN | 41 | 83.79 | 3435.39 | spain
2999 | 10425 | 12 | 0000-00-00 | NaN | 0000-00-00 | 119 | 1370 | S10_4962 | In Process | NaN | 38 | 131.49 | 4996.62 | spain
3000 | 10425 | 13 | 0000-00-00 | NaN | 0000-00-00 | 119 | 1370 | S18_4600 | In Process | NaN | 38 | 107.76 | 4094.88 | spain
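
In the sample rows, sales_amount equals quantityOrdered times priceEach (for example 49 × 35.29 = 1729.21). This would account for the high correlation of sales_amount with both fields reported in the alerts. A small check over the first four sample rows:

```python
# (quantityOrdered, priceEach, sales_amount) from the first four sample rows.
rows = [
    (49, 35.29, 1729.21),
    (50, 55.09, 2754.50),
    (30, 136.00, 4080.00),
    (22, 75.46, 1660.12),
]

# Compare with a small tolerance because the amounts are rounded to cents.
mismatches = [r for r in rows if abs(r[0] * r[1] - r[2]) > 0.005]
print("rows where quantity * price differs from the amount:", mismatches)
```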

Duplicate rows

Most frequently occurring

(index) | orderNumber | orderLineNumber | orderDate | shippedDate | requiredDate | customerNumber | employeeNumber | productCode | status | comments | quantityOrdered | priceEach | sales_amount | origin | # duplicates
0 | 10104 | 2 | 0000-00-00 | 0000-00-00 | 0000-00-00 | 141 | 1370 | S50_1514 | Shipped | NaN | 32 | 53.31 | 1705.92 | spain | 2
1 | 10410 | 2 | 0000-00-00 | 0000-00-00 | 0000-00-00 | 357 | 1612 | S18_3136 | Shipped | NaN | 34 | 84.82 | 2883.88 | spain | 2
2 | 10413 | 6 | 0000-00-00 | 0000-00-00 | 0000-00-00 | 175 | 1323 | S32_3207 | Shipped | Customer requested that DHL is used for this shipping | 24 | 56.55 | 1357.20 | spain | 2
3 | 10419 | 1 | 0000-00-00 | 0000-00-00 | 0000-00-00 | 382 | 1401 | S18_1589 | Shipped | NaN | 37 | 100.80 | 3729.60 | spain | 2
4 | 10425 | 3 | 0000-00-00 | NaN | 0000-00-00 | 119 | 1370 | S18_2238 | In Process | NaN | 28 | 147.36 | 4126.08 | spain | 2
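
A table like this one can be produced by counting how often each complete row occurs and keeping the rows seen more than once. A minimal sketch with made-up toy rows (not data from this report), using only the standard library:

```python
from collections import Counter

# Toy rows as tuples; the values are illustrative only.
rows = [
    (1, "a", 10.0),
    (1, "a", 10.0),
    (2, "b", 20.0),
    (3, "c", 30.0),
    (3, "c", 30.0),
]

counts = Counter(rows)
# Keep only rows that occur more than once, with their multiplicity.
duplicates = {row: n for row, n in counts.items() if n > 1}

for row, n in duplicates.items():
    print(row, "# duplicates:", n)
```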